TSIS is an R package for detecting transcript isoform switches in time-series data. A transcript isoform switch occurs when a pair of isoforms reverse the order of their expression levels, as shown in Figure 1. TSIS characterizes transcript switches by 1) defining the isoform switch time points for any pair of transcript isoforms within a gene, 2) describing each switch using five different features, 3) filtering the results according to the user’s specifications and 4) visualizing the results using different plots so that the user can examine the switches in further detail. All the functions are available through a graphical interface implemented as a Shiny App (a web application framework for R) (Chang, et al., 2016), in which users can carry out the analysis with a few mouse clicks. The tool can also be run from the command line without the graphical interface. This tutorial covers both in the following sections.
Figure 1: Isoform switch analysis for a pair of isoforms \(iso_i\) and \(iso_j\). The points in the plots represent the samples and the black lines are the averages of the samples. (A) shows the iso-kTSP algorithm for comparing two conditions \(c_1\) and \(c_2\). (B) shows the extension of iso-kTSP to time-series isoform switch (TSIS). The time-series with 6 time points is divided into 4 intervals by the intersection points of the average expression. Five features for switch evaluation are determined from the intervals before and after each switch, e.g. the intervals adjacent to switch point \(P_i\).
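Given that a pair of isoforms \(iso_i\) and \(iso_j\) may switch several times in a time-series, TSIS offers two approaches to search for the switch time points: intersections of the average expression values at each time point, and intersections of spline-fitted expression curves.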
It is reasonable to assume that isoform expression varies smoothly (curvilinearly) over the time-series. However, the average expression at each time point, taken on its own, loses precision because it ignores information from the preceding and following time points. The spline method fits the expression time-series using control points (their number depends on the spline degrees of freedom provided) and weights from several neighbouring points to obtain the desired precision (Hastie and Tibshirani, 1990). The spline method is useful for finding global trends in the time-series when the data are very noisy, but it may sacrifice local details of a switch. For example, a rough spline fit with very few control points may result in large shifts of the switch points. Users can apply both the average and the spline method to search for switch points and choose the better output by inspecting the switch plots.
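The two search strategies can be sketched in a few lines of base R. This is an illustrative sketch, not the TSIS implementation: `find_intersections` and `smooth_profile` are hypothetical helpers, crossing points are located by linear interpolation between adjacent time points, and the smoothing uses `splines::ns` inside `lm` as one possible choice of spline fit.

```r
library(splines)

## Hypothetical helper: crossing points of two profiles measured at times t.
## A crossing is reported where the sign of (y1 - y2) flips between
## neighbouring time points; exact ties at a time point are ignored here.
find_intersections <- function(t, y1, y2) {
  d <- y1 - y2
  k <- which(d[-length(d)] * d[-1] < 0)
  ## zero of the linearly interpolated difference inside [t[k], t[k + 1]]
  x <- t[k] - d[k] * (t[k + 1] - t[k]) / (d[k + 1] - d[k])
  ## expression value shared by both profiles at the crossing
  y <- y1[k] + (y1[k + 1] - y1[k]) * (x - t[k]) / (t[k + 1] - t[k])
  data.frame(x = x, y = y)
}

## Hypothetical helper: smooth a profile with a natural cubic spline.
smooth_profile <- function(t, y, df = 3) {
  fit <- lm(y ~ ns(t, df = df))
  as.numeric(predict(fit))
}

t  <- 1:6
y1 <- c(1, 2, 4, 3, 1, 0.5)   # made-up average expression of isoform i
y2 <- c(3, 3, 2, 2, 2, 3)     # made-up average expression of isoform j

## crossings of the raw averages
raw.points <- find_intersections(t, y1, y2)
## crossings of the spline-smoothed profiles
spline.points <- find_intersections(t, smooth_profile(t, y1),
                                    smooth_profile(t, y2))
print(raw.points)
print(spline.points)
```

Comparing `raw.points` and `spline.points` on the same pair of isoforms gives a feel for how much the choice of spline degrees of freedom shifts the switch points.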
We define each transcript isoform switch by 1) the switch point \(P_i\), 2) the time points between switch points \(P_{i-1}\) and \(P_i\) as the interval before switch \(P_i\) and 3) the time points between switch points \(P_i\) and \(P_{i+1}\) as the interval after switch \(P_i\) (see Figure 1(B)). We define five measurements to score each isoform switch. The first two are the probability/frequency of the switch and the sum of the average sample differences before and after the switch, which are similar to Score 1 and Score 2 in the iso-kTSP method (Sebestyen, et al., 2015) (see Figure 1(A)). For Score 2, we use the average sample differences directly instead of the rank differences used in iso-kTSP, which avoids possible ties.
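As a rough illustration of the first two scores, the sketch below follows the iso-kTSP form of Score 1 and uses average differences for Score 2. It rests on assumptions and is not the TSIS code: `switch_scores` is a hypothetical helper, the inputs are the sample values of the two isoforms in the intervals before and after one switch point, and the exact formulas used by `iso.switch` should be checked in the package documentation.

```r
## Hypothetical helper: first two switch scores for one switch point.
## before.i / before.j: samples of isoform i and j in the interval before the switch
## after.i  / after.j : samples of isoform i and j in the interval after the switch
switch_scores <- function(before.i, before.j, after.i, after.j) {
  ## Score 1 (assumed iso-kTSP form): how consistently the order reverses
  p.before <- mean(before.i > before.j)
  p.after  <- mean(after.i < after.j)
  score1 <- abs(p.before + p.after - 1)
  ## Score 2 (assumed form): sum of absolute average differences in both intervals
  score2 <- abs(mean(before.i - before.j)) + abs(mean(after.i - after.j))
  c(prob = score1, diff = score2)
}

## made-up samples: isoform i is higher before the switch and lower after it
s <- switch_scores(before.i = c(5, 6, 7), before.j = c(1, 2, 3),
                   after.i  = c(1, 1, 1), after.j  = c(4, 5, 6))
print(s)
```

A Score 1 close to 1 means that nearly all samples agree on the reversal of order, while Score 2 grows with the size of the expression gap on both sides of the switch.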
Due to an issue with devtools, if R is installed in a directory whose name contains a space character, e.g. in “C:\Program Files”, users may get the error message “‘C:\Program’ is not recognized as an internal or external command”. To avoid this, make sure that R is installed in a directory whose name contains no space characters. Users can check the R installation location by typing
R.home()
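The same check can be scripted. The snippet below is a small convenience sketch (the variable names are illustrative) that reports whether the installation path contains a space:

```r
## check whether the R installation path contains a space character
r.home <- R.home()
has.space <- grepl(" ", r.home, fixed = TRUE)
if (has.space) {
  message("R is installed in a path containing spaces: ", r.home)
} else {
  message("R installation path has no spaces: ", r.home)
}
```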
##install dependency packages
install.packages(c("shiny", "shinythemes", "ggplot2", "plotly", "zoo", "gtools", "devtools"), dependencies = TRUE)
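To avoid reinstalling packages that are already present, the dependency list can first be compared with the installed packages. This is an optional sketch using base R only; the install call is left commented out so that nothing is installed by accident.

```r
## dependency packages listed in this tutorial
pkgs <- c("shiny", "shinythemes", "ggplot2", "plotly", "zoo", "gtools", "devtools")
## keep only those not found in the installed package list
missing.pkgs <- setdiff(pkgs, rownames(installed.packages()))
if (length(missing.pkgs) > 0) {
  message("Not yet installed: ", paste(missing.pkgs, collapse = ", "))
  ## install.packages(missing.pkgs, dependencies = TRUE)
} else {
  message("All dependency packages are already installed.")
}
```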
Install the TSIS package from GitHub using the devtools package.
library(devtools)
devtools::install_github("wyguo/TSIS")
Once installed, the TSIS package can be loaded as normal.
library(TSIS)
The TSIS package provides the example dataset “AtRTD2” with 2,666 genes and 6,307 isoforms, measured at 26 time points, each with 3 biological replicates and 3 technical replicates. The experiments were designed to investigate the Arabidopsis gene expression response to cold. The isoform expression is in TPM (transcripts per million) format. For details of the experiments and data quantification, please see the AtRTD2 paper (Zhang, et al., 2016). Other types of transcript quantification, such as read counts or Percent Spliced In (PSI) values, can also be used in TSIS.
The data loaded into the Shiny App must be in *.csv format. Users can download the example datasets from https://github.com/wyguo/TSIS/tree/master/data or by typing the following command in the R console:
AtRTD2.example()
The data will be saved in a folder “example data” in the working directory. Figure 3 shows examples of the input data in csv format.
To make the analysis more user-friendly, TSIS is integrated into a Shiny App (Chang, et al., 2016). By typing
TSIS.app()
in the R console after loading the TSIS package, the App opens in the default web browser. Users can upload input datasets, set parameters for the switch analysis, and visualize and save the results with a few mouse clicks. The TSIS App includes three tab panels (see Figure 2(A)).
The first tab panel includes this user manual.
There are four sections in the second tab panel (see Figure 2).
Figure 2: Second tab panel in TSIS Shiny App. (A) is the three tab panels of the app; (B) is the data input interface; (C) is the interface for TSIS parameter setting; (D) provides the density/frequency plots of isoform switch time and (E) shows the output of TSIS analysis.
Three *.csv format input files can be provided for TSIS analysis.
Figure 3: The format of input csv files for (A) transcript isoform expression, (B) two column table of gene-isoform mapping and (C) one column of a subset of isoform names.
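For users preparing their own data, the sketch below writes three toy csv files of the kinds listed in Figure 3. The layout is an assumption to be checked against Figure 3 and the example files (isoforms in rows and samples in columns for the expression table, gene names in the first column and isoform names in the second for the mapping table); all isoform, gene, sample and file names are made up.

```r
out.dir <- file.path(tempdir(), "tsis_toy_input")
dir.create(out.dir, showWarnings = FALSE)

## (A) isoform expression: 3 isoforms, 2 time points x 2 replicates (made-up values)
data.exp <- data.frame(
  T1.rep1 = c(10.2, 3.1, 7.5), T1.rep2 = c(9.8, 2.9, 7.9),
  T2.rep1 = c(4.0, 8.7, 7.2),  T2.rep2 = c(4.4, 9.1, 6.8),
  row.names = c("G1_P1", "G1_P2", "G2_P1")
)
## (B) gene-isoform mapping: two columns
mapping <- data.frame(genes    = c("G1", "G1", "G2"),
                      isoforms = c("G1_P1", "G1_P2", "G2_P1"))
## (C) subset of isoform names: one column
sub.isoforms <- data.frame(isoforms = c("G1_P1", "G1_P2"))

write.csv(data.exp, file.path(out.dir, "data.exp.csv"))
write.csv(mapping, file.path(out.dir, "mapping.csv"), row.names = FALSE)
write.csv(sub.isoforms, file.path(out.dir, "sub.isoforms.csv"), row.names = FALSE)
list.files(out.dir)
```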
Figure 2(B) and Figure 4(A) show the data input interface for time-series isoform expression and gene-isoform mapping. Clicking the “Browse…” button opens a window for data loading (see Figure 4(B)). Users can use the interface shown in Figure 4(C) to load the names of a subset of isoforms.
Figure 4: Interface for input information.
The section in Figure 2(C) and Figure 5 is used to set the parameters for TSIS. The parameters can be set by selecting or typing in the corresponding boxes. The scoring process starts when the “Scoring” button is clicked. The parameter setting details are given in the text following the “Scoring” button (see Figure 5(A)). Processing tracking bars for the search for time-series intersection points and for switch scoring of the isoform pairs (Figure 5(B)) appear at the bottom of the browser.
Figure 5(C) is the interface for scoring feature filtering. Users can set cut-offs, such as for the probability/frequency of switch and sum of average differences, to further refine the switch results. The parameter setting details are in the text under the “Filtering” button.
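Conceptually, filtering keeps the rows of the score table that pass every cut-off. The toy example below illustrates the idea on a made-up data frame; the column names `prob` and `diff` are hypothetical, the use of a strict `>` comparison is an assumption, and the real table returned by TSIS has more columns.

```r
## made-up score table with two of the five scores
scores <- data.frame(
  iso1 = c("G1_P1", "G2_P1", "G3_P1"),
  iso2 = c("G1_P2", "G2_P2", "G3_P2"),
  prob = c(0.9, 0.4, 0.8),   # probability/frequency of switch
  diff = c(5.2, 3.0, 0.6)    # sum of average differences
)
## keep switches that pass both cut-offs
keep <- scores$prob > 0.5 & scores$diff > 1
scores.filtered <- scores[keep, ]
print(scores.filtered)
```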
Figure 5: TSIS parameter setting section. (A) is the scoring parameter input interface; (B) is the processing tracking bars and (C) is the switch score filtering interface.
Isoform switches occur at different time points in the time-series. To visualize the frequency and density of switch times, the TSIS Shiny App provides the plot interface shown in Figure 6. Frequency and density bar plots and line plots, which correspond to the isoform switch time points after the scoring and filtering processes, are displayed by clicking the corresponding radio buttons. The plot can be saved in html, pdf or png format.
Note: The plot is made using the plotly R package. Users can move the mouse over the plot to show plot values and select part of the plot to zoom in. More actions are available from the tool bar in the top right corner of the plot.
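Outside the App, the same kind of summary can be approximated with base R. The sketch below uses a made-up vector of switch times; the binning into unit-width time intervals is an illustrative choice, not necessarily what the App does.

```r
## made-up switch time points (x coordinates of the switch points)
switch.time <- c(9.4, 10.1, 10.3, 10.8, 12.5, 13.2, 13.3, 16.7)

## frequency: number of switches falling into each unit-width time interval
freq <- table(cut(switch.time, breaks = 9:17))
print(freq)

## density: kernel density estimate of the switch times
dens <- density(switch.time)

## draw the plots only in an interactive session
if (interactive()) {
  barplot(freq, xlab = "Time interval", ylab = "Number of switches")
  plot(dens, main = "Density of switch time")
}
```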
Figure 6: Switch time density and frequency plot interface.
The output table of the TSIS analysis is displayed after scoring or filtering. The columns include the isoform names, the isoform ratios to genes, the intervals before and after the switch, the coordinates of the switch points and the five scores of switch quality. Table columns can be sorted by clicking the small triangles beside the column names, and contents can be searched by typing text in the search box. The explanations for each column are at the top of the table (see Figure 7).
Figure 7: The output switch measurements table.
Figure 8: The third tab panel of TSIS Shiny App. (A) is the section for plotting a switch by providing a pair of isoform names. (B) is used to save the top \(n\) plots into a local folder.
Any pair of switched transcript isoforms can be visualized by providing their names. The plot type options are error bar plot and ribbon plot (see functions geom_errorbar and geom_smooth in the ggplot2 package for details), as shown in Figure 8(A); example plots for AT5G60930 are shown in Figure 9 and Figure 10. An option is provided to label only the features of switch points whose probability/frequency of switch exceeds the cut-off within the time region under investigation. The plots can be saved in html (plotly format plot), png or pdf format.
Transcript isoform switch profiles can be plotted in batch by selecting the top n pairs of isoforms (ranked by Score 1, the probability/frequency of switch) and saving the plots in png or pdf format (see Figure 8(B)).
In addition to the Shiny App, users can run the TSIS analysis with scripts in the R console. The following examples give a step-by-step tutorial of the analysis. For details of each function, please use the help function, e.g. help(iso.switch) or ?iso.switch.
##load the data
library(TSIS)
data.exp<-AtRTD2$data.exp
mapping<-AtRTD2$mapping
dim(data.exp);dim(mapping)
Example 1: search intersection points with mean expression
##Scores
scores.mean2int<-iso.switch(data.exp=data.exp,mapping =mapping,
t.start=1,t.end=26,nrep=9,rank=F,
min.t.points =2,min.difference=1,spline =F,
spline.df = 9,verbose = F)
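With `spline = F`, intersections are searched on the mean expression at each time point. The averaging step could look like the sketch below, which assumes that the columns of the expression table are ordered by time point with `nrep` consecutive replicate columns each. That column layout is an assumption, and `time_point_means` is a hypothetical helper rather than a TSIS function.

```r
## Hypothetical helper: average the replicates of each time point.
## expr: isoforms in rows, samples in columns (nrep consecutive columns per time point)
time_point_means <- function(expr, nrep) {
  expr <- as.matrix(expr)
  stopifnot(ncol(expr) %% nrep == 0)
  ## time point index of every column: 1,1,...,2,2,...
  grp <- rep(seq_len(ncol(expr) / nrep), each = nrep)
  ## sum the replicate columns per time point, then divide by nrep
  t(rowsum(t(expr), grp) / nrep)
}

## toy example: 2 isoforms, 2 time points, 2 replicates each (made-up values)
toy.exp <- matrix(c(1, 3, 5, 7,
                    2, 4, 6, 8),
                  nrow = 2, byrow = TRUE,
                  dimnames = list(c("G1_P1", "G1_P2"),
                                  c("T1.rep1", "T1.rep2", "T2.rep1", "T2.rep2")))
toy.mean <- time_point_means(toy.exp, nrep = 2)
print(toy.mean)
```

For the AtRTD2 example, the corresponding call would use `nrep = 9`, giving one column per time point for the 26 time points.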
Example 2: search intersection points with spline method
##Scores, set spline=T and define spline degree of freedom to spline.df=9
scores.spline2int<-iso.switch(data.exp=data.exp,mapping =mapping,
t.start=1,t.end=26,nrep=9,rank=F,
min.t.points =2,min.difference=1,spline =T,
spline.df = 9,verbose = F)
Example 1: general filtering with cut-offs
##intersection from mean expression
scores.mean2int.filtered<-score.filter(
scores = scores.mean2int,prob.cutoff = 0.5,diff.cutoff = 1,
t.points.cutoff = 2,pval.cutoff = 0.01, cor.cutoff = 0.5,
data.exp = NULL,mapping = NULL,sub.isoform.list = NULL,
sub.isoform = F,max.ratio = F,x.value.limit = c(9,17)
)
scores.mean2int.filtered[1:5,]
##intersection from spline method
scores.spline2int.filtered<-score.filter(
scores = scores.spline2int,prob.cutoff = 0.5,
diff.cutoff = 1,t.points.cutoff = 2,pval.cutoff = 0.01,
cor.cutoff = 0.5,data.exp = NULL,mapping = NULL,
sub.isoform.list = NULL,sub.isoform = F,max.ratio = F,
x.value.limit = c(9,17)
)
Example 2: only show subset of results according to an isoform list
##intersection from mean expression
##input a list of isoform names for investigation.
sub.isoform.list<-AtRTD2$sub.isoforms
sub.isoform.list[1:10]
##assign the isoform name list to sub.isoform.list and set sub.isoform=TRUE
scores.mean2int.filtered.subset<-score.filter(
scores = scores.mean2int,prob.cutoff = 0.5,diff.cutoff = 1,
t.points.cutoff = 2,pval.cutoff = 0.01, cor.cutoff = 0.5,
data.exp = NULL,mapping = NULL,sub.isoform.list = sub.isoform.list,
sub.isoform = T,max.ratio = F,x.value.limit = c(9,17)
)
Example 3: only show results of the most abundant transcript within a gene
scores.mean2int.filtered.maxratio<-score.filter(
scores = scores.mean2int,prob.cutoff = 0.5,diff.cutoff = 1,
t.points.cutoff = 2,pval.cutoff = 0.01, cor.cutoff = 0,
data.exp = data.exp,mapping = mapping,sub.isoform.list = NULL,
sub.isoform = F,max.ratio = T,x.value.limit = c(9,17)
)
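To illustrate what "most abundant transcript within a gene" means, the toy sketch below computes the ratio of each isoform to its gene and flags the isoform with the maximum ratio. It is a guess at the idea behind `max.ratio`, not the TSIS code; the data and names are made up, and abundance is taken here as the total expression over all samples.

```r
## made-up expression: 3 isoforms x 2 samples
toy.exp <- matrix(c(8, 9,
                    2, 1,
                    5, 5),
                  nrow = 3, byrow = TRUE,
                  dimnames = list(c("G1_P1", "G1_P2", "G2_P1"), c("s1", "s2")))
## gene of each isoform, in the same order as the rows of toy.exp
genes <- c("G1", "G1", "G2")

iso.total  <- rowSums(toy.exp)                  # total expression per isoform
gene.total <- ave(iso.total, genes, FUN = sum)  # total expression of the parent gene
iso.ratio  <- iso.total / gene.total            # isoform ratio to gene

## isoform(s) with the maximum ratio within each gene
is.top <- iso.ratio == ave(iso.ratio, genes, FUN = max)
print(round(iso.ratio, 2))
print(names(iso.ratio)[is.top])
```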
##error bar plot
plotTSIS(data2plot = data.exp,scores = scores.mean2int.filtered,
iso1 = 'AT5G60930_P2',iso2 = 'AT5G60930_P3',gene.name = NULL,
y.lab = 'Expression',make.plotly = F,
t.start = 1,t.end = 26,nrep = 9,prob.cutoff = 0.5,
x.lower.boundary = 9,x.upper.boundary = 17,
show.region = T,show.scores = T,
line.width =0.5,point.size = 3,
error.type = 'stderr',show.errorbar = T,errorbar.size = 0.5,
errorbar.width = 0.2,spline = F,spline.df = NULL,ribbon.plot = F)
##ribbon plot
plotTSIS(data2plot = data.exp,scores = scores.mean2int.filtered,
iso1 = 'AT5G60930_P2',iso2 = 'AT5G60930_P3',gene.name = NULL,
y.lab = 'Expression',make.plotly = F,
t.start = 1,t.end = 26,nrep = 9,prob.cutoff = 0.5,
x.lower.boundary = 9,x.upper.boundary = 17,
show.region = T,show.scores = T,
line.width =0.5,point.size = 3,
error.type = 'stderr',show.errorbar = T,errorbar.size = 0.5,
errorbar.width = 0.2,spline = F,spline.df = NULL,ribbon.plot = T)
Chang, W., et al. shiny: Web Application Framework for R. 2016. https://CRAN.R-project.org/package=shiny
Hastie, T.J. and Tibshirani, R.J. Generalized additive models. Chapter 7 of Statistical Models in S eds. Wadsworth & Brooks/Cole 1990.
Sebestyen, E., Zawisza, M. and Eyras, E. Detection of recurrent alternative splicing switches in tumor samples reveals novel signatures of cancer. Nucleic Acids Res 2015;43(3):1345-1356.
Zhang, R., et al. AtRTD2: A Reference Transcript Dataset for accurate quantification of alternative splicing and expression changes in Arabidopsis thaliana RNA-seq data. bioRxiv 2016.
## R version 3.3.1 (2016-06-21)
## Platform: x86_64-w64-mingw32/x64 (64-bit)
## Running under: Windows 7 x64 (build 7601) Service Pack 1
##
## locale:
## [1] LC_COLLATE=English_United Kingdom.1252
## [2] LC_CTYPE=English_United Kingdom.1252
## [3] LC_MONETARY=English_United Kingdom.1252
## [4] LC_NUMERIC=C
## [5] LC_TIME=English_United Kingdom.1252
##
## attached base packages:
## [1] stats graphics grDevices utils datasets methods base
##
## other attached packages:
## [1] BiocStyle_2.3.30 devtools_1.12.0 TSIS_0.1.0
##
## loaded via a namespace (and not attached):
## [1] Rcpp_0.12.9 codetools_0.2-14 digest_0.6.10 withr_1.0.2
## [5] rprojroot_1.2 backports_1.0.5 magrittr_1.5 evaluate_0.10
## [9] stringi_1.1.1 rmarkdown_1.3 tools_3.3.1 stringr_1.1.0
## [13] yaml_2.1.14 memoise_1.0.0 htmltools_0.3.5 knitr_1.15.1